Add an optimization doc on TPU #21155

Open

bvrockwell wants to merge 26 commits into vllm-project:main from bvrockwell:main

+97 −0

Contributor

bvrockwell commented Jul 18, 2025 (edited by github-actions bot)

Creating a place in the docs for TPU optimization tips (WIP).

bvrockwell and others added 9 commits

July 15, 2025 14:29


          draft

336ce6f


          rename

ecd6308


          add tuner

de7986e


          add information about calculator

19164c8


          updating the docs

a0eedfa


          Merge branch 'vllm-project:main' into main

0585f3b


          update

62e520e


          add link to more docs

caf6377


          Merge branch 'vllm-project:main' into main

9d4a924

bvrockwell requested a review from hmellor as a code owner

July 18, 2025 00:52

github-actions bot commented Jul 18, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

mergify bot added documentation tpu labels

gemini-code-assist bot reviewed

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request adds a new documentation page with optimization tips for using vLLM on TPUs. The content is valuable and well-structured. I've identified a few issues, mainly related to broken or fragile links and a missing image, which could impact the user experience. I've also found a typo in a command-line argument that could cause issues for users who copy-paste it. Please see my detailed comments.

docs/configuration/tpu/README.md: 5 outdated review comments (resolved)

bvrockwell marked this pull request as draft

July 18, 2025 00:53

bvrockwell and others added 11 commits

July 17, 2025 17:56


          Update docs/configuration/tpu/README.md

a764418

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>


          expand on certain topics

415de9c


          clean up

95c0174


          Merge branch 'vllm-project:main' into main

bcd39b2


          Update docs/configuration/tpu/README.md

ae71890

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>


          update

d6d6127


          update

e24df03


          update calculator URL

cd36e2a


          update image path

e3f9818


          update

8527af4


          fix image

661debe

bvrockwell marked this pull request as ready for review

July 23, 2025 19:34

Contributor Author

bvrockwell commented Jul 23, 2025

@Chenyaaang @yaochengji ptal


          Merge branch 'vllm-project:main' into main

Chenyaaang reviewed

docs/configuration/tpu/README.md: 1 outdated review comment (resolved)

yaochengji reviewed

Collaborator

yaochengji left a comment

Thanks for the awesome documentation! Left a few comments!

docs/configuration/tpu/README.md: 2 outdated review comments (resolved)

docs/configuration/tpu/README.md Outdated

    
              Although the first compilation can take some time, for all subsequent server launches, vLLM can load these graphs directly from the cache, eliminating the compilation time for future runs. 

              Use `VLLM_XLA_CACHE_PATH` environment variable to write to shareable storage for future launches.

Collaborator

yaochengji Jul 23, 2025

Do we need to mention that there's actually an issue with the persistent compilation cache?

docs/configuration/tpu/README.md: 1 outdated review comment (resolved)


          updates

6f581e8

hmellor requested changes

Member

hmellor left a comment (edited)

Thanks for the docs contribution!

I have some general style comments initially:

- Please fix the pre-commit and DCO checks.
- Please avoid using bold text in the headings.
- Some of the headings are quite long and might not render as intended.
- In the navigation bar, this page is nested in a TPU directory, and you only see the title of the page if you look inside that directory.

docs/configuration/tpu/README.md Outdated

Member

hmellor Jul 24, 2025

Why docs/configuration/tpu/README.md instead of docs/configuration/tpu.md?

docs/configuration/tpu/README.md Outdated

    
              #### **SPMD**

              More details to come.

              #### Want us to cover something that isn't listed here? Open up an issue please and cite this doc. We'd love to hear your questions or tips.

Member

hmellor Jul 24, 2025

This shouldn't be a heading; the whole thing appears in the ToC.

docs/configuration/tpu/README.md Outdated

    
              ### **Get started**

              Looking for setup and installation instructions? Find them [here](https://docs.vllm.ai/en/latest/getting_started/installation/google_tpu.html).

Member

hmellor Jul 24, 2025

Please use relative md links for internal docs references (http links will break in future releases).

docs/configuration/tpu/README.md Outdated

    
              #### **Profiling**

              The auto-tuner provides a profile of optimized configurations as its final step. However, interpreting this profile can be challenging for new users. We plan to expand this section in the future with more detailed guidance. In the meantime, you can learn how to collect a TPU profile using vLLM's native profiling tools [here](https://docs.vllm.ai/en/latest/examples/offline_inference/profiling_tpu.html). This profile can provide valuable insights into your workload's performance.

Member

hmellor Jul 24, 2025

Please use relative md links for internal docs references (http links will break in future releases).

docs/configuration/tpu/README.md Outdated

Comment on lines 74 to 79

    
              ### **If possible, use the precision that matches the chip’s hardware acceleration**

              - v5e has int4/int8 hardware acceleration in the MXU

              - v6e has int4/int8 hardware acceleration in the MXU

              ### **Don't set TP to be less than the number of chips on a single-host deployment**

Member

hmellor Jul 24, 2025

These headings are also quite long

bvrockwell added 4 commits

July 25, 2025 11:01


          clean up

84d4e0c


          updating

f3cc0ee

fix

b58940d


          reword

4194f12

bvrockwell requested review from hmellor, Chenyaaang and yaochengji

July 25, 2025 18:16

yaochengji approved these changes

Collaborator

yaochengji left a comment

LGTM!
Could you fix the pre-commit and DCO? https://docs.vllm.ai/en/latest/contributing/index.html#code-quality

Labels

documentation tpu